In statistics, Cook's distance or Cook's ''D'' is a commonly used estimate of the influence of a data point when performing a least squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate data points that are particularly worth checking for validity, or to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977.

==Definition==
Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance are considered to merit closer examination in the analysis.

For the algebraic expression, first define

:<math>\mathbf{H} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}</math>

as the hat matrix (or projection matrix) of the observations of the explanatory variables. Then let <math>\mathbf{b}^{(-i)}</math> be the OLS coefficient estimate that results from omitting the <math>i</math>-th observation (<math>i = 1, \dots, n</math>), and <math>\mathbf{b}</math> the estimate from the full sample. Then we have

:<math>\mathbf{b} - \mathbf{b}^{(-i)} = \frac{(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_i^{\mathsf{T}} e_i}{1 - h_{ii}},</math>

where <math>e_i = y_i - \mathbf{x}_i \mathbf{b}</math> is the residual (i.e., the difference between the observed value and the value fitted by the proposed model), and <math>h_{ii}</math>, defined as

:<math>h_{ii} = \mathbf{x}_i(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{x}_i^{\mathsf{T}},</math>

is the leverage, i.e., the <math>i</math>-th diagonal element of <math>\mathbf{H}</math>. With this, we can define Cook's distance as

:<math>D_i = \frac{e_i^2}{p\, s^2}\left[\frac{h_{ii}}{(1 - h_{ii})^2}\right],</math>

where <math>p</math> is the number of fitted parameters and <math>s^2 = \frac{\mathbf{e}^{\mathsf{T}}\mathbf{e}}{n - p}</math> is the mean square error of the regression model. Algebraically equivalent is the following expression

:<math>D_i = \frac{\left(\mathbf{b} - \mathbf{b}^{(-i)}\right)^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{X}\left(\mathbf{b} - \mathbf{b}^{(-i)}\right)}{p\, \hat{\sigma}^2},</math>

where <math>\hat{\sigma}^2</math> is the OLS estimate of the variance of the error term, defined as

:<math>\hat{\sigma}^2 = \frac{\mathbf{e}^{\mathsf{T}}\mathbf{e}}{n - p},</math>

which coincides with the mean square error <math>s^2</math> above. And a third equivalent expression is

:<math>D_i = \frac{\sum_{j=1}^{n} \left(\hat{y}_j - \hat{y}_{j(i)}\right)^2}{p\, \hat{\sigma}^2},</math>

where <math>\hat{y}_{j(i)}</math> is the prediction for observation ''j'' from a refitted regression model in which observation ''i'' has been omitted.
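Because the closed-form expression in terms of residuals and leverages requires only a single fit of the full model, Cook's distance can be computed without refitting the regression <math>n</math> times. The following is a minimal NumPy sketch of that computation, assuming a design matrix <code>X</code> that already contains an intercept column; the function name <code>cooks_distance</code> and the simulated data are illustrative and not part of any standard library.

<syntaxhighlight lang="python">
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i for each observation of an OLS fit.

    X : (n, p) design matrix (including the intercept column)
    y : (n,) response vector
    """
    n, p = X.shape
    # OLS estimate b = (X'X)^{-1} X'y and the residuals e_i
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    residuals = y - X @ b
    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^{-1} X'
    leverage = np.einsum('ij,jk,ik->i', X, XtX_inv, X)
    # Mean square error s^2 = e'e / (n - p)
    s2 = residuals @ residuals / (n - p)
    # D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2
    return (residuals**2 / (p * s2)) * leverage / (1.0 - leverage)**2


# Example with simulated data: flag the most influential points
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)
print(np.argsort(D)[-3:])  # indices of the three largest D_i
</syntaxhighlight>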